WNILS Working Group Chris Weider
INTERNET-DRAFT Merit Network, Inc.
Jim Fullton
CNIDR
Simon Spero
3/26/93 UNC Chapel Hill
Architecture of the Whois++ Index Service
Status of this memo:
The authors describe an architecture for indexing in distributed databases,
and apply this to the WHOIS++ protocol.
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted
by other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other than
as a "working draft" or "work in progress."
Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any
other Internet Draft.
This Internet Draft expires October 1, 1993.
1. Purpose:
The WHOIS++ directory service [Deutsch, et al, 1992] is intended to provide
a simple, extensible directory service predicated on a template-based
information model and a flexible query language. This document describes
an architecture designed to link together many of these WHOIS++ servers
into a distributed, searchable wide area directory service.
2. Scope:
This document details a distributed, easily maintained architecture for
providing a unified index to a large number of distributed WHOIS++
servers. This architecture can be used with systems other than WHOIS++ to
provide a distributed directory service which is also searchable.
3. Motivation and Introduction:
It seems clear that with the vast amount of directory information potentially
available on the Internet, it is simply infeasible to build a centralized
directory to serve all this information. Therefore, we should look at building
a distributed directory service. If we are to distribute the directory service,
the easiest (although not necessarily the best) way of building the directory
service is to build a hierarchy of directory information collection agents.
In this architecture, a directory query is delivered to a certain agent
in the tree, and then handed up or down, as appropriate, so that the query
is delivered to the agent which holds the information which fills the query.
This approach has been tried before, most notably in some implementations of
the X.500 standard. However, there are a number of major flaws with the approach
as it has been taken. This new Index Service is designed to fix these flaws.
WNILS Working Group Whois++ Index Service Weider, et al.
3.1 The search problem
One of the primary assumptions made by recent implementations of distributed
directory services is that every entry resides in some location in a hierarchical
name space. While this arrangement is ideal for reading the entry once
one knows its location, it is not as good when one is searching for the location
in the namespace of those entries which meet some set of criteria. If the only
criteria we know about a desired entry are items which do not appear in the
namespace, we are forced to do a global query. Whenever we issue a global
query (at the root of the namespace), or a query at the top of a given subtree
in the namespace, that query is replicated to _all_ subtrees of the starting
point. The replication of the query to all subtrees is not necessarily a
problem; queries are cheap. However, every server to which the query has been
replicated must process that query, even if it has no entries which match
the specified criteria. This part of the global query processing is quite
expensive. A poorly designed namespace or a thin namespace can cause the
vast majority of queries to be replicated globally, but a very broad
namespace can cause its own navigation problems. Because of these problems,
search has been turned off at high levels of the X.500 namespace.
3.2 The location problem
With global search turned off, one must know in advance how the name space is
laid out so that one can guide a query to a proper location. Also, the layout
of the namespace then becomes critical to a user's ability to find the
desired information. Thus there are endless battles about how to lay out the
name space to best serve a given set of users, and enormous headaches whenever
it becomes apparent that the current namespace is unsuited to the current
usages and must be changed (as recently happened in X.500). Also, assuming
one does impose multiple hierarchies on the entries through use of the
namespace, the mechanisms to maintain these multiple hierarchies in X.500 do
not exist yet, and it is possible to move entries out from under their
pointers. Also, there is as yet no agreement on how the X.500 namespace
should look even for the White Pages types of information that are currently
installed in the X.500 pilot project.
3.3 The Yellow Pages problem
Current implementations of this hierarchical architecture have also been
unsuited to solving the Yellow Pages problem; that is, the problem of
easily and flexibly building special-purpose directories (say of molecular
biologists) and of automatically maintaining these directories once they have
been built. In particular, the attributes appropriate to the new directory
must be built into the namespace because that is the only way to segregate
related entries into a place where they can be found without a global
search. Also, there is a classification problem; how does one adequately
specify the proper categories so that people other than the creator of the
directory can find the correct subtree? Additionally, there is the problem
of actually finding the data to put into the subtree; if one must traverse
the hierarchy to find the data, one has to search globally for the proper
entries.
3.4 Solutions
We'll hold off for a moment on describing the actual architecture used in
our solution to these problems and concentrate on a high level description of
what solutions are provided by our conceptual approach. To begin with,
although every entry in WHOIS++ does indeed have a unique identifier
(resides in a specific location in the namespace), the navigational algorithms
used to reach a specific entry do not necessarily depend on the identifier the
entry has been assigned. The Index Service gets around the namespace and
hierarchy problems by creating a directory mesh on top of the entries.
Each layer of the mesh has a set of 'forward knowledge' which indicates the
contents of the various servers at the next lower layer of the mesh. Thus
when a query is received by a server in a given layer of the mesh, it can
prune the search tree and hand the query off to only those lower level servers
which have indicated that they might be able to answer it. Thus search becomes
feasible at all levels of the mesh. In the current version of this architecture,
we have chosen a certain set of information to hand up the mesh as forward
knowledge. This may or may not be exactly the set of information required to
construct a truly searchable directory, but the protocol itself doesn't
restrict the types of information which can be handed around.
Another benefit provided by the mesh of index servers is that since the
entry identification scheme has been decoupled from the navigation service,
multiple hierarchies can be built and easily maintained on top of the
existing data. Also, the user does not need to know in advance where in the
mesh the entry is contained.
Also, the Yellow Pages problem now becomes tractable, as the index servers
can pick and choose between information proffered by a given server;
because we have an architecture that allows for automatic polling of data,
special purpose directories become easy to construct and to maintain.
4. Components of the Index Service:
4.1 WHOIS++ servers
The whois++ service is described in [Deutsch, et al, 1992]. As that service
specifies only the query language, the information model, and the server
responses, whois++ services can be provided by a wide variety of databases
and directory services. However, to participate in the Index Service, that
underlying database must also be able to generate a 'centroid', or some other
type of forward knowledge, for the data it serves.
4.2 Centroids as forward knowledge
The centroid of a server consists of a list of the templates and
attributes used by that server, and a word list for each attribute.
The word list for a given attribute contains one occurrence of every
word which appears at least once in that attribute in some record in that
server's data, and nothing else.
For example, if a whois++ server contains exactly three records, as follows:
Record 1                          Record 2
Template: User                    Template: User
First Name: John                  First Name: Joe
Last Name: Smith                  Last Name: Smith
Favourite Drink: Labatt Beer      Favourite Drink: Molson Beer

Record 3
Template: Domain
Domain Name: foo.edu
Contact Name: Mike Foobar
the centroid for this server would be
Template: User
    First Name:       Joe
                      John
    Last Name:        Smith
    Favourite Drink:  Beer
                      Labatt
                      Molson
Template: Domain
    Domain Name:      foo.edu
    Contact Name:     Mike
                      Foobar
It is this information which is handed up the tree to provide forward knowledge.
As mentioned above, this may not turn out to be the ideal solution for
forward knowledge, and we suspect that there may be a number of different
sets of forward knowledge used in the Index Service. However, the directory
architecture is in a very real sense independent of what types of forward
knowledge are handed around, and it is entirely possible to build a
unified directory which uses many types of forward knowledge.
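To make the definition concrete, the following Python sketch builds a centroid from records like the three above. It is a minimal sketch under stated assumptions: each record is a hypothetical in-memory dict carrying its template name under the key "Template", and words are whitespace-separated and compared case-sensitively. The draft itself does not specify tokenization or case handling.

```python
from collections import defaultdict

# Hypothetical in-memory model: a record maps attribute names to values and
# carries its template name under the key "Template".
Record = dict[str, str]
# Centroid: template name -> attribute name -> set of distinct words.
Centroid = dict[str, dict[str, set[str]]]


def compute_centroid(records: list[Record]) -> Centroid:
    """Collect every word that appears at least once in each attribute."""
    acc = defaultdict(lambda: defaultdict(set))
    for record in records:
        template = record["Template"]
        for attribute, value in record.items():
            if attribute == "Template":
                continue
            # Assumption: words are whitespace-separated and case-sensitive.
            acc[template][attribute].update(value.split())
    # Convert the nested defaultdicts into plain dicts.
    return {template: dict(attrs) for template, attrs in acc.items()}


records = [
    {"Template": "User", "First Name": "John", "Last Name": "Smith",
     "Favourite Drink": "Labatt Beer"},
    {"Template": "User", "First Name": "Joe", "Last Name": "Smith",
     "Favourite Drink": "Molson Beer"},
    {"Template": "Domain", "Domain Name": "foo.edu",
     "Contact Name": "Mike Foobar"},
]

centroid = compute_centroid(records)
for template, attrs in centroid.items():
    print(f"Template: {template}")
    for attribute, words in attrs.items():
        print(f"    {attribute}: {' '.join(sorted(words))}")
```

The printed word lists match the centroid shown above. The draft fixes no order for the words, so sorting them is only a presentation choice.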
4.3 Index servers and Index server Architecture
A whois++ index server collects and collates the centroids (or other forward
knowledge) of either a number of whois++ servers or of a number of other index
servers. An index server must be able to generate a centroid for the
information it contains.
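The draft does not say how an index server derives its own centroid. One natural reading, sketched below in Python as an assumption, is the union of the centroids it has collected. Under this reading the index server keeps each child's centroid separately, because it needs them to route queries, and hands the merged word lists up to the next layer. The server handles "A" and "B" and their contents are illustrative only.

```python
# Centroid: template name -> attribute name -> set of distinct words.
Centroid = dict[str, dict[str, set[str]]]


def merge_centroids(collected: dict[str, Centroid]) -> Centroid:
    """Union the centroids an index server has collected (keyed by handle).

    Assumption: the union is one possible way to build the index server's
    own centroid; the draft only requires that one can be generated.
    """
    merged: Centroid = {}
    for centroid in collected.values():
        for template, attrs in centroid.items():
            merged_attrs = merged.setdefault(template, {})
            for attribute, words in attrs.items():
                # A fresh set is created here, so children are never mutated.
                merged_attrs.setdefault(attribute, set()).update(words)
    return merged


# Hypothetical data: index server D of Figure 1 collects from A and B.
collected_by_d = {
    "A": {"User": {"Last Name": {"Smith"}, "First Name": {"Joe", "John"}}},
    "B": {"User": {"Last Name": {"Jones"}},
          "Domain": {"Domain Name": {"bar.edu"}}},
}
centroid_d = merge_centroids(collected_by_d)
print(centroid_d)
```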
4.3.1 Queries to index servers
An index server will take a query in standard whois++ format, search its
collections of centroids, determine which servers hold records which may fill
that query, and then either a) forward the query to the appropriate servers
on behalf of the user, or b) notify the user's client of the next servers
to contact to submit the query.
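The search step can be sketched minimally in Python. The sketch assumes that a query is a set of attribute/value terms, all of which must be satisfied within one template. The real query language is defined in [Deutsch, et al, 1992] and is richer than this. The helper names are hypothetical. A server is selected when its centroid contains every word of every term. Because a centroid records which words occur but not which record they occur in, a hit means only that the server may hold a match.

```python
# Centroid: template name -> attribute name -> set of distinct words.
Centroid = dict[str, dict[str, set[str]]]


def may_answer(query: dict[str, str], centroid: Centroid) -> bool:
    """True if, within some template, every word of every term is present."""
    for attrs in centroid.values():
        if all(
            attribute in attrs and set(value.split()) <= attrs[attribute]
            for attribute, value in query.items()
        ):
            return True
    return False


def servers_to_ask(query: dict[str, str], collection: dict[str, Centroid]) -> list[str]:
    """Handles of the servers whose centroids cannot rule the query out."""
    return sorted(handle for handle, c in collection.items() if may_answer(query, c))


# Hypothetical collection held by an index server, keyed by server handle.
collection = {
    "A": {"User": {"First Name": {"Joe", "John"}, "Last Name": {"Smith"}}},
    "B": {"User": {"First Name": {"Mary"}, "Last Name": {"Jones"}}},
    "C": {"Domain": {"Domain Name": {"foo.edu"}}},
}
smith = servers_to_ask({"Last Name": "Smith"}, collection)
mary_smith = servers_to_ask({"First Name": "Mary", "Last Name": "Smith"}, collection)
print(smith, mary_smith)
```

The index server would then either forward the query to the returned servers (chaining) or hand that list back to the client (referral).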
4.3.2 Index server distribution model and centroid propagation
Figure 1 illustrates how a mesh of index servers is created for a set of
whois++ servers.
whois++              index servers       index servers
servers              for whois++         for lower-level
                     servers             index servers

 _______
|       |
|   A   |__
|_______|  \          _______
            \--------|       |
 _______             |   D   |__
|       |   /--------|_______|  \         _______
|   B   |__/                     \-------|       |
|_______|                                |   F   |
                                  /------|_______|
                                 /
 _______              _______   /
|       |            |       |-/
|   C   |------------|   E   |
|_______|            |_______|-\
                                \
 _______                         \        _______
|       |                         \------|       |
|   G   |--------------------------------|   H   |
|_______|                                |_______|

Figure 1: Sample layout of the Index Service mesh
In the portion of the index tree shown above, whois++ servers A and B hand their
centroids up to index server D, whois++ server C hands its centroid up to
index server E, and index servers D and E hand their centroids up to index
server F. Servers E and G also hand their centroids up to H.
The number of levels of index servers, and the number of index servers at each
level, will depend on the number of whois++ servers deployed, and the response
time of individual layers of the server tree. These numbers will have to
be determined in the field.
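Because a server may hand its centroid to more than one index server, the result is a mesh rather than a strict tree. The Python sketch below encodes the edges of Figure 1 and computes which whois++ servers each index server ultimately covers. The dictionary layout and the function name are illustrative assumptions. In this mesh, server C's data is reachable from both F and H.

```python
# Figure 1 as data (hypothetical encoding): each server -> the index
# servers it hands its centroid up to.
HANDS_CENTROID_TO = {
    "A": ["D"], "B": ["D"], "C": ["E"], "G": ["H"],
    "D": ["F"], "E": ["F", "H"],
}


def indexed_whois_servers(index_server: str, edges: dict[str, list[str]]) -> set[str]:
    """Whois++ servers whose data reaches index_server, directly or not."""
    # Invert the edges: index server -> servers that report to it.
    children: dict[str, list[str]] = {}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    covered: set[str] = set()
    stack, seen = [index_server], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in children:
            stack.extend(children[node])  # an index server: descend
        else:
            covered.add(node)  # nothing reports to it: a whois++ server
    return covered


for top in ("F", "H"):
    print(top, sorted(indexed_whois_servers(top, HANDS_CENTROID_TO)))
```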
4.3.4 Centroid propagation and changes to centroids
Centroid propagation is initiated by an authenticated POLL command (sec. 5.2).
The format of the POLL command allows the poller to request the centroid of
any or all templates and attributes held by the polled server. After the
polled server has authenticated the poller, it determines which of the
requested centroids the poller is allowed to request, and then issues a
CENTROID-CHANGES report (sec. 5.3) to transmit the data. When the poller
receives the CENTROID-CHANGES report, it can authenticate the pollee to
determine whether to add the centroid changes to its data. Additionally, if
a given pollee knows what pollers hold centroids from the pollee, it can
signal to those pollers the fact that its centroid has changed by issuing
a DATA-CHANGED command. The poller can then determine if and when to
issue a new POLL request to get the updated information. The DATA-CHANGED
command is included in this protocol to allow 'interactive' updating of
critical information.
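The exchange can be sketched from the pollee's side. The Python below is a hypothetical model, not a wire implementation. It assumes that authentication has already happened, that the access policy is a plain mapping from poller handle to permitted templates, and that the handles are made up. It returns only the permitted part of the centroid and optionally remembers the poller's Server-handle (sec. 5.2). When new words appear, it reports which pollers should be sent DATA-CHANGED.

```python
from dataclasses import dataclass, field
from typing import Union

# Centroid: template name -> attribute name -> set of distinct words.
Centroid = dict[str, dict[str, set[str]]]


@dataclass
class Pollee:
    """Hypothetical pollee-side state; authentication is assumed done already."""

    handle: str
    centroid: Centroid
    # Assumed access policy: poller handle -> templates that poller may receive.
    permitted: dict[str, set[str]]
    # Server-handles of pollers, optionally cached from their POLL messages.
    pollers: set[str] = field(default_factory=set)

    def handle_poll(self, poller: str, templates: Union[list[str], str]) -> Centroid:
        """Return the part of the centroid this poller asked for and may see."""
        wanted = list(self.centroid) if templates == "ALL" else templates
        allowed = self.permitted.get(poller, set())
        reply = {t: self.centroid[t] for t in wanted if t in allowed and t in self.centroid}
        if reply:
            self.pollers.add(poller)  # remember whom to notify later
        return reply

    def change_centroid(self, template: str, attribute: str, words: set[str]) -> list[str]:
        """Add words; return the pollers that should receive DATA-CHANGED."""
        current = self.centroid.setdefault(template, {}).setdefault(attribute, set())
        if words <= current:
            return []  # no new words, so the centroid has not changed
        current.update(words)
        return sorted(self.pollers)


pollee = Pollee(
    handle="EXAMPLE-WHOIS-A",  # hypothetical handle
    centroid={"User": {"Last Name": {"Smith"}}, "Domain": {"Domain Name": {"foo.edu"}}},
    permitted={"EXAMPLE-INDEX-D": {"User"}},
)
reply_d = pollee.handle_poll("EXAMPLE-INDEX-D", "ALL")  # User template only
reply_x = pollee.handle_poll("EXAMPLE-INDEX-X", "ALL")  # nothing permitted
notify = pollee.change_centroid("User", "Last Name", {"Jones"})
```

The poller that receives DATA-CHANGED still decides for itself if and when to issue a new POLL. The sketch only identifies whom to tell.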
4.3.5 Query handling and passing algorithms
When an index server receives a query, it searches its collection of centroids,
and determines which servers hold records which may fill that query. As
whois++ becomes widely deployed, it is expected that some index servers
may specialize in indexing certain whois++ templates or perhaps even
certain fields within those templates. If an index server obtains a match
with the query _for those template fields and attributes the server indexes_,
it is to be considered a match for the purpose of forwarding the query.
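The matching rule for a specialized index server can be sketched as follows. The sketch assumes that queries are qualified by template and that the server's coverage is a set of (template, attribute) pairs. Both representations are hypothetical. Terms on attributes the server does not index are skipped, because the server has no forward knowledge with which to rule them out.

```python
# Centroid: template name -> attribute name -> set of distinct words.
Centroid = dict[str, dict[str, set[str]]]


def matches_indexed_fields(
    template: str,
    query: dict[str, str],
    centroid: Centroid,
    indexed: set[tuple[str, str]],
) -> bool:
    """Match only on the (template, attribute) pairs this server indexes."""
    attrs = centroid.get(template, {})
    for attribute, value in query.items():
        if (template, attribute) not in indexed:
            continue  # not indexed here, so this term cannot rule the server out
        if not set(value.split()) <= attrs.get(attribute, set()):
            return False
    return True


# A hypothetical index server that indexes only the User/Last Name field.
indexed = {("User", "Last Name")}
centroid_a = {"User": {"Last Name": {"Smith"}}}
hit = matches_indexed_fields(
    "User", {"Last Name": "Smith", "First Name": "Zed"}, centroid_a, indexed)
miss = matches_indexed_fields("User", {"Last Name": "Jones"}, centroid_a, indexed)
print(hit, miss)
```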
There are two methods of forwarding a query, called 'chaining' and 'referral'.
4.3.5.1 Query referral
Query referral is the process of informing a client which servers to contact
next to resolve a query. The syntax for notifying a client is outlined in
section 5.5.
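From the client's side, referral becomes a loop. The client asks a server, then either collects the results or queues the servers it names. The Python sketch below stands in for the network with a dictionary of canned replies shaped like Figure 1. The reply tuples and handles are assumptions. The guard against contacting a server twice is also an addition not found in the draft: in a mesh, two index servers can refer the client to the same server.

```python
from collections import deque
from typing import Callable

# Assumed reply shape: ("SERVERS-TO-ASK", [handles]) or ("RESULTS", [records]).
Reply = tuple[str, list]


def resolve_by_referral(start: str, ask: Callable[[str], Reply]) -> tuple[list, list[str]]:
    """Follow referrals breadth-first; return results and servers contacted."""
    results: list = []
    contacted: list[str] = []
    queue = deque([start])
    while queue:
        server = queue.popleft()
        if server in contacted:
            continue  # a mesh may refer the client to the same server twice
        contacted.append(server)
        kind, payload = ask(server)
        if kind == "SERVERS-TO-ASK":
            queue.extend(payload)
        else:
            results.extend(payload)
    return results, contacted


# Hypothetical stand-in for the network, for a query that index server F
# believes only whois++ servers A and C may answer.
replies = {
    "F": ("SERVERS-TO-ASK", ["D", "E"]),
    "D": ("SERVERS-TO-ASK", ["A"]),
    "E": ("SERVERS-TO-ASK", ["C"]),
    "A": ("RESULTS", ["record from A"]),
    "C": ("RESULTS", ["record from C"]),
}
results, contacted = resolve_by_referral("F", lambda handle: replies[handle])
print(contacted, results)
```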
4.3.5.2 Query chaining
Query chaining is done when the queried index server takes responsibility for
resubmitting the query to the appropriate lower servers. The server forwards
the query using the syntax in section 5.4; once it has done so, it takes no
further responsibility for the query. A whois++ query can specify the
'trace' option, which causes each server which receives the query to
send its IANA handle and an identification string to the client.
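Chaining and the 'trace' option can be simulated in a few lines. In this hypothetical Python model each server is an object. Whois++ servers hold records, and index servers hold children. For brevity, templates are ignored and centroids are recomputed on demand, whereas a real index server would store the centroids it has polled. When tracing is on, every server that receives the query appends its handle and an identification string to a list. That list stands in for the messages sent to the client. The handles are invented, not IANA-assigned.

```python
from dataclasses import dataclass, field
from typing import Optional

Record = dict[str, str]


@dataclass
class Server:
    """Hypothetical model: whois++ servers hold records, index servers hold children."""

    handle: str
    description: str
    records: list[Record] = field(default_factory=list)
    children: list["Server"] = field(default_factory=list)


def centroid_of(server: Server) -> dict[str, set[str]]:
    """Attribute -> words, unioned over the server's records and its children."""
    words: dict[str, set[str]] = {}
    for record in server.records:
        for attribute, value in record.items():
            words.setdefault(attribute, set()).update(value.split())
    for child in server.children:
        for attribute, child_words in centroid_of(child).items():
            words.setdefault(attribute, set()).update(child_words)
    return words


def satisfies(query: Record, words: dict[str, set[str]]) -> bool:
    """True if every word of every query term appears for that attribute."""
    return all(set(v.split()) <= words.get(a, set()) for a, v in query.items())


def chain(server: Server, query: Record,
          trace: Optional[list[tuple[str, str]]], results: list[Record]) -> None:
    """Deliver the query to server, then chain it to children that may answer."""
    if trace is not None:
        trace.append((server.handle, server.description))  # reported to the client
    for record in server.records:
        if satisfies(query, {a: set(v.split()) for a, v in record.items()}):
            results.append(record)
    for child in server.children:
        if satisfies(query, centroid_of(child)):
            chain(child, query, trace, results)


a = Server("EXAMPLE-WHOIS-A", "whois++ server A",
           records=[{"First Name": "John", "Last Name": "Smith"}])
b = Server("EXAMPLE-WHOIS-B", "whois++ server B",
           records=[{"First Name": "Mary", "Last Name": "Jones"}])
d = Server("EXAMPLE-INDEX-D", "index server D", children=[a, b])
f = Server("EXAMPLE-INDEX-F", "index server F", children=[d])

trace: list[tuple[str, str]] = []
results: list[Record] = []
chain(f, {"Last Name": "Smith"}, trace, results)
print(trace)
print(results)
```

Server B never sees the query, because D's copy of B's centroid rules it out.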
5. Syntax for operations of the Index Service:
5.1 Data changed syntax
The data changed template looks like this:
DATA-CHANGED:
Version-number: // version number of index service software, used to ensure
// compatibility
Time-of-latest-centroid-change: // time stamp of latest centroid change, GMT
Time-of-message-generation: // time when this message was generated, GMT
Server-handle: // IANA unique identifier for this server
Best-time-to-poll: // For heavily used servers, this will identify when
// the server is likely to be lightly loaded
// so that response to the poll will be speedy, GMT
Authentication-type: // Type of authentication used by server, or NONE
Authentication-data: // data for authentication
END DATA-CHANGED // This line must be used to terminate the data changed
// message
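A DATA-CHANGED message can be rendered with a small serializer. The Python sketch below makes three assumptions: one "Field: value" line per field, CR/LF line endings throughout, and a YYYYMMDDHHMMSS timestamp layout. The draft specifies only that times are GMT, so the timestamp format and the sample handle are not taken from it.

```python
import time


def format_gmt(epoch_seconds: int) -> str:
    # Assumed timestamp layout (YYYYMMDDHHMMSS); the draft says only "GMT".
    return time.strftime("%Y%m%d%H%M%S", time.gmtime(epoch_seconds))


def serialize_message(name: str, fields: list[tuple[str, str]]) -> str:
    """Render 'NAME:', one 'Field: value' line per field, then 'END NAME'."""
    lines = [f"{name}:"]
    lines += [f"{key}: {value}" if value else f"{key}:" for key, value in fields]
    lines.append(f"END {name}")
    # Assumption: every line is CR/LF terminated, as the Data field is.
    return "".join(line + "\r\n" for line in lines)


message = serialize_message("DATA-CHANGED", [
    ("Version-number", "1"),
    ("Time-of-latest-centroid-change", format_gmt(733147200)),
    ("Time-of-message-generation", format_gmt(733147260)),
    ("Server-handle", "EXAMPLE-WHOIS-A"),  # hypothetical IANA handle
    ("Best-time-to-poll", format_gmt(733190400)),
    ("Authentication-type", "NONE"),
    ("Authentication-data", ""),
])
print(message)
```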
5.2 Polling syntax
POLL:
Version-number: // version number of poller's index software, used to
// ensure compatibility
Start-time: // give me all the centroid changes starting at this time, GMT
End-time: // ending at this time, GMT
Template: // a standard whois++ template name, or the keyword ALL, for a
// full update.
Field: // used to limit centroid update information to specific fields,
// is either a specific field name, a list of field names,
// or the keyword ALL
Server-handle: // IANA unique identifier for the polling server.
// this handle may optionally be cached by the polled
// server to announce future changes
Authentication-type: // Type of authentication used by poller, or NONE
Authentication-data: // Data for authentication
END POLL // This line must be used to terminate the poll message
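Both poller and pollee must also parse these messages. The Python sketch below assumes that a line beginning with a name followed by a colon starts a new field. Any other line is treated as a continuation of the previous field's value list, and this is how the sketch reads a multi-valued Field or a Data word list. The draft does not fix that encoding. A continuation word that itself begins with "name:" would be misparsed, so a fuller parser would need the template's field names.

```python
import re

# Assumption: a field line starts with a name such as "Start-time:"; any other
# non-blank line continues the previous field's value list.
FIELD_LINE = re.compile(r"^([A-Za-z][A-Za-z-]*):[ \t]*(.*)$")


def parse_message(text: str) -> tuple[str, list[tuple[str, list[str]]]]:
    """Split 'NAME: ... END NAME' into the message name and its fields."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].endswith(":"):
        raise ValueError("message must start with 'NAME:'")
    name = lines[0][:-1]
    fields: list[tuple[str, list[str]]] = []
    for line in lines[1:]:
        if line == f"END {name}":
            return name, fields
        match = FIELD_LINE.match(line)
        if match:
            value = match.group(2)
            fields.append((match.group(1), [value] if value else []))
        elif fields:
            fields[-1][1].append(line)  # continuation of the previous field
        else:
            raise ValueError(f"unexpected line before first field: {line!r}")
    raise ValueError(f"missing 'END {name}' line")


poll = ("POLL:\r\n"
        "Version-number: 1\r\n"
        "Start-time: 19930301000000\r\n"
        "End-time: 19930326000000\r\n"
        "Template: User\r\n"
        "Field: First Name\r\n"
        "Last Name\r\n"
        "Server-handle: EXAMPLE-INDEX-D\r\n"
        "Authentication-type: NONE\r\n"
        "Authentication-data:\r\n"
        "END POLL\r\n")
name, fields = parse_message(poll)
values = dict(fields)
print(name, values)
```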
5.3 Centroid change report
CENTROID-CHANGES:
Version-number: // version number of pollee's index software, used to
// ensure compatibility
Start-time: // change list starting time, GMT
End-time: // change list ending time, GMT
Server-handle: // IANA unique identifier of the responding server
Authentication-type: // Type of authentication used by pollee, or NONE
Authentication-data: // Data for authentication
Compression-type: // Type of compression used on the data, or NONE
Size-of-compressed-data: // size of compressed data if compression is used
Operation: // One of 3 keywords: ADD, DELETE, FULL
// ADD - add these entries to the centroid for this server
// DELETE - delete these entries from the centroid of this
// server
// FULL - the full centroid as of end-time follows
Multiple occurrences of the following block of fields:
Template: // a standard whois++ template name
Field: // a field name within that template
Data: // the word list itself, one per line, cr/lf terminated
end of multiply repeated block
END CENTROID-CHANGES // This line must be used to terminate the centroid
// change report
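On receipt, the poller folds the report into the centroid it holds for that pollee. The minimal Python sketch below assumes the blocks have already been parsed into (template, field, words) triples. It works on a copy, so a report can be rejected (for example, if the pollee fails authentication) without corrupting stored data. Dropping empty fields and templates after a DELETE is a tidiness choice, not something the draft requires.

```python
import copy

# Centroid of ONE pollee as stored by the poller: template -> field -> words.
Centroid = dict[str, dict[str, set[str]]]


def apply_centroid_changes(
    stored: Centroid, operation: str, blocks: list[tuple[str, str, list[str]]]
) -> Centroid:
    """Return the pollee's centroid after one CENTROID-CHANGES report."""
    if operation not in ("ADD", "DELETE", "FULL"):
        raise ValueError(f"unknown Operation: {operation}")
    # FULL replaces everything; ADD and DELETE edit a copy of what we hold.
    result = {} if operation == "FULL" else copy.deepcopy(stored)
    for template, field_name, words in blocks:
        if operation == "DELETE":
            fields = result.get(template, {})
            remaining = fields.get(field_name, set()) - set(words)
            if remaining:
                fields[field_name] = remaining
            else:
                fields.pop(field_name, None)
            if not fields:
                result.pop(template, None)
        else:
            result.setdefault(template, {}).setdefault(field_name, set()).update(words)
    return result


held = {"User": {"Last Name": {"Smith"}}}
after_add = apply_centroid_changes(held, "ADD", [("User", "Last Name", ["Jones"])])
after_delete = apply_centroid_changes(after_add, "DELETE", [("User", "Last Name", ["Smith"])])
emptied = apply_centroid_changes(after_delete, "DELETE", [("User", "Last Name", ["Jones"])])
after_full = apply_centroid_changes(after_delete, "FULL", [("Domain", "Domain Name", ["foo.edu"])])
print(after_add, after_delete, emptied, after_full)
```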
5.4 Forwarded query
FORWARDED-QUERY:
Version-number: // version number of forwarder's index software, used to
// ensure compatibility
Forwarded-From: // IANA unique identifier of the server forwarding query
Forwarded-time: // time this query forwarded, GMT (used for debugging)
Trace-option: // YES if query has 'trace' option listed, NO if not.
// used at message reception time to generate trace information
Query-origination-address: // address of origin of query
Body-of-Query: // The original query goes here
Authentication-type: // Type of authentication used by queryer
Authentication-data: // Data for authentication
END FORWARDED-QUERY // This line must be used to terminate the body of the
// query
5.5 Query referral
SERVERS-TO-ASK:
Version-number: // version number of index software, used to ensure
// compatibility
Query-id: // some query identifier so the client knows which query to
// issue to the following servers
Body-of-Query: // the original query goes here
Next-Servers: // A list of servers to ask next, either IP addresses or
// hostnames, one per line, cr/lf terminated
END SERVERS-TO-ASK
6 References
Deutsch, et al. Architecture of the WHOIS++ service. August 1992.
Available by anonymous FTP as
ucdavis.edu://pub/archive/wnils/Architecture.Overview
7 Authors' Addresses
Chris Weider
clw@merit.edu
Industrial Technology Institute, Pod G
2901 Hubbard Rd,
Ann Arbor, MI 48105
O: (313) 747-2730
F: (313) 747-3185
Jim Fullton
fullton@concert.net
MCNC Center for Communications
Post Office Box 12889
3021 Cornwallis Road
Research Triangle Park
North Carolina 27709-2889
O: 919-248-1499
F: 919-248-1405
Simon Spero
ses@sunsite.unc.edu
310 Wilson Library CB #3460
University of North Carolina
Chapel Hill, NC 27599-3460
O: (919) 962-9107
F: (919) 962-5604